An actuarial risk analysis framework that applies Natural Language Processing (NLP) to motor vehicle crash data from the National Motor Vehicle Crash Causation Survey (NMVCCS) and to insurance claims data, using fine-tuned embedding models and topic modeling with BERTopic.
This repository contains an end-to-end pipeline for extracting actuarial insights from unstructured text and structured data. It combines traditional actuarial analysis with modern NLP techniques to identify significant risk patterns and translate them into business intelligence for risk assessment, claims analysis, and underwriting.
Key Value Proposition:

- Transform unstructured insurance claims into quantifiable risk metrics
- Identify hidden patterns in crash data using advanced topic modeling
- Generate demographic risk profiles with statistical validation
- Provide actionable underwriting recommendations with premium adjustment strategies
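The last two points, demographic risk profiles with statistical validation and premium adjustments, can be illustrated with a standard claim-frequency calculation. The sketch below is not the repository's implementation. It assumes claim counts and exposures per demographic group are already available, compares each group's claim frequency with a reference group, and attaches a 95% Wald confidence interval on the log scale, treating claim counts as Poisson. The rule "adjust the premium only when the interval excludes 1", the class and function names, and the numbers are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupStats:
    """Claim experience for one demographic group (hypothetical structure)."""
    name: str
    claims: int      # observed claim count
    exposure: float  # exposure units, e.g. policy-years (assumed)


def relativity_with_ci(group: GroupStats, ref: GroupStats, z: float = 1.96):
    """Claim-frequency relativity vs. a reference group, with a Wald CI.

    Claim counts are treated as Poisson, so the variance of the log rate
    ratio is approximately 1/claims_group + 1/claims_ref.
    """
    if group.claims == 0 or ref.claims == 0:
        raise ValueError("each group needs at least one claim")
    rel = (group.claims / group.exposure) / (ref.claims / ref.exposure)
    se = math.sqrt(1 / group.claims + 1 / ref.claims)
    lower = math.exp(math.log(rel) - z * se)
    upper = math.exp(math.log(rel) + z * se)
    return rel, lower, upper


def adjusted_premium(base: float, rel: float, lower: float, upper: float) -> float:
    """Hypothetical rule: apply the relativity only if the CI excludes 1."""
    if lower > 1.0 or upper < 1.0:
        return round(base * rel, 2)
    return base


if __name__ == "__main__":
    # Illustrative numbers only, not results from this project.
    ref = GroupStats("age 26-65", claims=400, exposure=10_000.0)
    young = GroupStats("age 16-25", claims=120, exposure=2_000.0)
    rel, lower, upper = relativity_with_ci(young, ref)
    print(f"{young.name}: relativity {rel:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
    print("adjusted premium:", adjusted_premium(1_000.0, rel, lower, upper))
```

With these illustrative inputs the relativity is 1.5 with a 95% interval of roughly (1.22, 1.83), so the hypothetical rule raises a base premium of 1,000 to 1,500. A production rating model would typically also apply credibility weighting rather than a hard significance cut-off.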
```mermaid
graph TD
    A[NMVCCS Data Scraping] --> B[Actuarial Field Extraction]
    B --> C[Insurance Claims Processing]
    C --> D[GPT Text Augmentation]
    D --> E[Embedding Model Fine-tuning]
    E --> F[Model Deployment & Evaluation]
    F --> G[BERTopic Modeling]
    G --> H[Demographic Risk Analysis]
    H --> I[Actuarial Insights & Recommendations]
```
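The stages in the diagram run in sequence, each consuming the previous stage's output. The following is a minimal, dependency-free sketch of that flow on a few toy records, covering only field extraction, topic assignment, and the demographic summary. The dict-based record format, the `Driver age:` narrative pattern, and all function names are hypothetical. In particular, the keyword rules are a stand-in used only to keep the example self-contained: in the actual pipeline, topics come from BERTopic applied to embeddings from the fine-tuned model.

```python
import re
from collections import Counter
from typing import Callable, Iterable

Record = dict
Stage = Callable[[list], list]

# Hypothetical narrative format; real NMVCCS fields would be parsed differently.
AGE_PATTERN = re.compile(r"Driver age:\s*(\d+)", re.IGNORECASE)

# Keyword rules standing in for BERTopic topics (illustration only).
TOPIC_RULES = {"intersection": "intersection crash", "rear": "rear-end collision"}


def extract_fields(records: list[Record]) -> list[Record]:
    """Stand-in for 'Actuarial Field Extraction': parse driver age from text."""
    out = []
    for r in records:
        match = AGE_PATTERN.search(r["narrative"])
        out.append({**r, "age": int(match.group(1)) if match else None})
    return out


def assign_topic(records: list[Record]) -> list[Record]:
    """Stand-in for 'BERTopic Modeling': the first matching keyword wins."""
    out = []
    for r in records:
        text = r["narrative"].lower()
        topic = next((label for kw, label in TOPIC_RULES.items() if kw in text), "other")
        out.append({**r, "topic": topic})
    return out


def age_band(age):
    if age is None:
        return "unknown"
    return "<=25" if age <= 25 else ">25"


def demographic_summary(records: list[Record]) -> Counter:
    """Stand-in for 'Demographic Risk Analysis': counts per (age band, topic)."""
    return Counter((age_band(r["age"]), r["topic"]) for r in records)


def run_pipeline(records: list[Record], stages: Iterable[Stage]) -> list[Record]:
    """Apply each stage to the output of the previous one, in order."""
    for stage in stages:
        records = stage(records)
    return records


if __name__ == "__main__":
    raw = [
        {"narrative": "Driver age: 19. Vehicle entered intersection on red."},
        {"narrative": "Driver age: 45. Rear impact in slow traffic."},
        {"narrative": "Driver age: 22. Rear-end collision at a light."},
    ]
    processed = run_pipeline(raw, [extract_fields, assign_topic])
    print(demographic_summary(processed))
```

Keeping each stage as a plain function over a list of records makes it easy to swap a stand-in for the real component (for example, replacing `assign_topic` with a BERTopic-based step) without touching the rest of the chain.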
Clone the Repository
```bash
git clone https://github.com/manuel.caccone/NLP-Actuarial-Loss-Modeling.git
cd NLP-Actuarial-Loss-Modeling
```

Environment Setup
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

GPU Dependencies (Optional but Recommended)
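```bash
# For cuML acceleration
conda install -c rapidsai -c conda-forge cuml
# Or using pip (RAPIDS wheels are hosted on NVIDIA's package index)
pip install --extra-index-url=https://pypi.nvidia.com cuml-cu11  # For CUDA 11.x
```

Environment Configuration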
```bash
cp .env.example .env
# Edit .env with your OpenAI API key and other configurations
```

```dotenv
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4

# Data Processing Settings
BATCH_SIZE=32
MAX_WORKERS=4
RATE_LIMIT_REQUESTS=60

# Model Training Parameters
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
FINE_TUNE_EPOCHS=3
LEARNING_RATE=2e-5

# Hugging Face Hub
HF_TOKEN=your_huggingface_token_here
HF_ORGANIZATION=ConsulStat
```
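The `.env` settings are plain `KEY=VALUE` strings, so they must be cast to the right types before use. The sketch below makes no claim about how the repository actually loads `.env` (it may well use a library such as python-dotenv); it is a stdlib-only illustration that parses the file contents into a typed, immutable object and keeps the API keys out of its `repr`. It assumes that `RATE_LIMIT_REQUESTS` is a per-minute limit shared by all `MAX_WORKERS` workers, and it does not handle quoted values or inline comments. `Settings`, `parse_env`, and `load_settings` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    """Typed view of the .env values (hypothetical helper)."""
    openai_api_key: str = field(repr=False)  # keep secrets out of logs
    hf_token: str = field(repr=False)
    openai_model: str
    batch_size: int
    max_workers: int
    rate_limit_requests: int  # assumed: requests per minute across all workers
    embedding_model: str
    fine_tune_epochs: int
    learning_rate: float
    hf_organization: str

    @property
    def seconds_between_requests_per_worker(self) -> float:
        # Split the global per-minute budget evenly across workers.
        per_worker_per_minute = self.rate_limit_requests / self.max_workers
        return 60.0 / per_worker_per_minute


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> Settings:
    """Build Settings from .env text; missing secrets raise KeyError."""
    env = parse_env(text)
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        hf_token=env["HF_TOKEN"],
        openai_model=env.get("OPENAI_MODEL", "gpt-4"),
        batch_size=int(env.get("BATCH_SIZE", "32")),
        max_workers=int(env.get("MAX_WORKERS", "4")),
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "60")),
        embedding_model=env.get("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"),
        fine_tune_epochs=int(env.get("FINE_TUNE_EPOCHS", "3")),
        learning_rate=float(env.get("LEARNING_RATE", "2e-5")),
        hf_organization=env.get("HF_ORGANIZATION", "ConsulStat"),
    )
```

For example, calling `load_settings(Path(".env").read_text())` on a file with the values above gives 60 / 4 = 15 requests per minute per worker, i.e. one request every 4 seconds per worker.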
Explore our comprehensive actuarial risk analysis through an interactive dashboard showcasing:

- Real-time Risk Assessment Metrics
- Demographic Risk Profiling Visualizations
- Topic Modeling Results with BERTopic
- Premium Adjustment Recommendations
Launch Interactive Dashboard